Eric Horvitz
Challenges in Human-Agent Communication
Bansal, Gagan, Vaughan, Jennifer Wortman, Amershi, Saleema, Horvitz, Eric, Fourney, Adam, Mozannar, Hussein, Dibia, Victor, Weld, Daniel S.
Remarkable advancements in modern generative foundation models have enabled the development of sophisticated and highly capable autonomous agents that can observe their environment, invoke tools, and communicate with other agents to solve problems. Although such agents can communicate with users through natural language, their complexity and wide-ranging failure modes present novel challenges for human-AI interaction. Building on prior research and informed by a communication grounding perspective, we contribute to the study of \emph{human-agent communication} by identifying and analyzing twelve key communication challenges that these systems pose. These include challenges in conveying information from the agent to the user, challenges in enabling the user to convey information to the agent, and overarching challenges that need to be considered across all human-agent communication. We illustrate each challenge through concrete examples and identify open directions of research. Our findings provide insights into critical gaps in human-agent communication research and serve as an urgent call for new design patterns, principles, and guidelines to support transparency and control in these systems.
From Medprompt to o1: Exploration of Run-Time Strategies for Medical Challenge Problems and Beyond
Nori, Harsha, Usuyama, Naoto, King, Nicholas, McKinney, Scott Mayer, Fernandes, Xavier, Zhang, Sheng, Horvitz, Eric
Run-time steering strategies like Medprompt are valuable for guiding large language models (LLMs) to top performance on challenging tasks. Medprompt demonstrates that a general LLM can be focused to deliver state-of-the-art performance on specialized domains like medicine by using a prompt to elicit a run-time strategy involving chain of thought reasoning and ensembling. OpenAI's o1-preview model represents a new paradigm, where a model is designed to do run-time reasoning before generating final responses. We seek to understand the behavior of o1-preview on a diverse set of medical challenge problem benchmarks. Following on the Medprompt study with GPT-4, we systematically evaluate the o1-preview model across various medical benchmarks. Notably, even without prompting techniques, o1-preview largely outperforms the GPT-4 series with Medprompt. We further systematically study the efficacy of classic prompt engineering strategies, as represented by Medprompt, within the new paradigm of reasoning models. We found that few-shot prompting hinders o1's performance, suggesting that in-context learning may no longer be an effective steering approach for reasoning-native models. While ensembling remains viable, it is resource-intensive and requires careful cost-performance optimization. Our cost and accuracy analysis across run-time strategies reveals a Pareto frontier, with GPT-4o representing a more affordable option and o1-preview achieving state-of-the-art performance at higher cost. Although o1-preview offers top performance, GPT-4o with steering strategies like Medprompt retains value in specific contexts. Moreover, we note that the o1-preview model has reached near-saturation on many existing medical benchmarks, underscoring the need for new, challenging benchmarks. We close with reflections on general directions for inference-time computation with LLMs.
A Computational Inflection for Scientific Discovery
Hope, Tom, Downey, Doug, Etzioni, Oren, Weld, Daniel S., Horvitz, Eric
We stand at the foot of a significant inflection in the trajectory of scientific discovery. As society continues on its fast-paced digital transformation, so does humankind's collective scientific knowledge and discourse. We now read and write papers in digitized form, and a great deal of the formal and informal processes of science are captured digitally -- including papers, preprints and books, code and datasets, conference presentations, and interactions in social networks and collaboration and communication platforms. The transition has led to the creation and growth of a tremendous amount of information -- much of which is available for public access -- opening exciting opportunities for computational models and systems that analyze and harness it. In parallel, exponential growth in data processing power has fueled remarkable advances in artificial intelligence, including large neural language models capable of learning powerful representations from unstructured text. Dramatic changes in scientific communication -- such as the advent of the first scientific journal in the 17th century -- have historically catalyzed revolutions in scientific thought. The confluence of societal and computational trends suggests that computer science is poised to ignite a revolution in the scientific process itself.
On the Horizon: Interactive and Compositional Deepfakes
Over a five-year period, computing methods for generating high-fidelity, fictional depictions of people and events moved from exotic demonstrations by computer science research teams into ongoing use as a tool of disinformation. The methods, referred to with the portmanteau of "deepfakes," have been used to create compelling audiovisual content. Here, I share challenges ahead with malevolent uses of two classes of deepfakes that we can expect to come into practice with costly implications for society: interactive and compositional deepfakes. Interactive deepfakes have the capability to impersonate people with realistic interactive behaviors, taking advantage of advances in multimodal interaction. Compositional deepfakes leverage synthetic content in larger disinformation plans that integrate sets of deepfakes over time with observed, expected, and engineered world events to create persuasive synthetic histories. Synthetic histories can be constructed manually but may one day be guided by adversarial generative explanation (AGE) techniques. In the absence of mitigations, interactive and compositional deepfakes threaten to move us closer to a post-epistemic world, where fact cannot be distinguished from fiction. I shall describe interactive and compositional deepfakes and reflect about cautions and potential mitigations to defend against them.
The Robot Brains Podcast: Eric Horvitz of Microsoft on AI for the greater good on Apple Podcasts
On Episode 15 of Season 2, we're joined by Eric Horvitz, Microsoft's first ever Chief Scientific Officer. His research spans theoretical and practical challenges with developing systems that perceive, learn, and reason. He's the company's top inventor since joining in 1993 with over 300 patents filed. He has been elected Fellow of the Association for the Advancement of Artificial Intelligence (AAAI), Fellow of the National Academy of Engineering (NAE), Fellow of the American Academy of Arts and Sciences, and Fellow of the American Association for the Advancement of Science (AAAS). He was a member of the National Security Commission on AI and he also co-founded important groups like the Partnership on AI, a non-profit organization bringing together Apple, Amazon, Facebook, Google, DeepMind, IBM, and Microsoft to document the quality and impact of AI systems on things like criminal justice, the economy, and media integrity.
Human-AI Symbiosis: A Survey of Current Approaches
Zahedi, Zahra, Kambhampati, Subbarao
In this paper, we aim at providing a comprehensive outline of the different threads of work in human-AI collaboration. By highlighting various aspects of works on the human-AI team such as the flow of complementing, task horizon, model representation, knowledge level, and teaming goal, we make a taxonomy of recent works according to these dimensions. Also, we organize different works in this area based on their knowledge and capability levels and their teaming goal perspectives. Then, we highlight how recent works can be categorized regarding these dimensions.
What Does an AI Ethicist Do?
Microsoft was one of the earliest companies to begin discussing and advocating for an ethical perspective on artificial intelligence. The issue began to take off at the company in 2016, when CEO Satya Nadella spoke at a developer conference about how the company viewed some of the ethical issues around AI, and later that year published an article about these issues. Nadella's primary focus was on Microsoft's orientation toward using AI to augment human capabilities and building trust into intelligent products. The next year, Microsoft's R&D head Eric Horvitz partnered with Microsoft's president and chief legal officer Brad Smith to form Aether, a cross-functional committee addressing AI and ethics in engineering and research. With these foundations laid, in 2018, Microsoft established a full-time position in AI policy and ethics.
I don't fear the rise of super-intelligence: Eric Horvitz
Eric Horvitz is a technical fellow and director at Microsoft Research Labs. A recipient of the Feigenbaum and the Allen Newell Prizes for contributions to artificial intelligence (AI), he is also on the US President's Council of Advisors on Science and Technology, Defense Advanced Research Projects Agency, and the Allen Institute for Artificial Intelligence. He is also part of the standing committee of Stanford University's One Hundred Year Study on Artificial Intelligence. Horvitz, who comes at least once a year to the country to interact with the India labs team, spoke about his work at Microsoft Research. He also shared his thoughts on the benefits and fears of AI, and on attempts to address bias in algorithms.